正文

pandas documentation — pandas 1.5.2 documentation (pydata.org)

36 - Introduction to Pandas - Data reading and handling

绘制直方图

import pandas as pd

df = pd.read_csv('images/grains/grain_measurements.csv')
df['Area'].plot(kind='hist', title='Area', bins=50)

<AxesSubplot:title={'center':'Area'}, ylabel='Frequency'>

png

创建 DataFrame, 修改 index 和 columns

data = [[10, 200, 60],
        [12, 155, 45],
        [9, 50, -45.],
        [16, 240, 90]]

df = pd.DataFrame(data, index=[1, 2, 3, 4], columns=['Area', 'Intensity', 'Orientation'])
df

	Area	Intensity	Orientation
1	10	200	60.0
2	12	155	45.0
3	9	50	-45.0
4	16	240	90.0

png

查看各行缺失信息

import pandas as pd

df = pd.read_csv('data/manual_vs_auto.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  100 non-null    object 
 1   Image       100 non-null    object 
 2   Manual      94 non-null     float64
 3   Manual2     3 non-null      float64
 4   Auto_th_2   100 non-null    int64  
 5   Auto_th_3   100 non-null    int64  
 6   Auto_th_4   100 non-null    int64  
dtypes: float64(2), int64(3), object(2)
memory usage: 5.6+ KB

查看表格行列数

1	`df.shape`

(100, 7)

查看整个表格

df

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67
...	...	...	...	...	...	...	...
95	Set4	Image96	106.0	NaN	75	112	98
96	Set4	Image97	80.0	NaN	66	80	88
97	Set4	Image98	92.0	NaN	73	93	95
98	Set4	Image99	116.0	NaN	101	115	93
99	Set4	Image100	99.0	NaN	77	106	102

100 rows × 7 columns

查看表格前 7 行

1	`df.head(7)`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67
5	Set1	Image6	115.0	NaN	82	124	105
6	Set1	Image7	102.0	NaN	68	103	93

查看表格后 7 行

1	`df.tail(7)`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
93	Set4	Image94	81.0	NaN	65	90	70
94	Set4	Image95	NaN	NaN	104	122	88
95	Set4	Image96	106.0	NaN	75	112	98
96	Set4	Image97	80.0	NaN	66	80	88
97	Set4	Image98	92.0	NaN	73	93	95
98	Set4	Image99	116.0	NaN	101	115	93
99	Set4	Image100	99.0	NaN	77	106	102

将某一列改为 index

1 2	`df1 = df.set_index('Image') df1.head()`

	Unnamed: 0	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
Image
Image1	Set1	92.0	93.0	70	87	82
Image2	Set1	87.0	83.0	60	85	83
Image3	Set1	104.0	98.0	74	99	94
Image4	Set1	99.0	NaN	73	101	109
Image5	Set1	89.0	NaN	59	90	67

查看 columns（列）名

1	`df1.columns`

Index(['Unnamed: 0', 'Manual', 'Manual2', 'Auto_th_2', 'Auto_th_3',
       'Auto_th_4'],
      dtype='object')

去重

1	`df['Unnamed: 0'].unique()`

array(['Set1', 'Set2', 'Set3', 'Set4'], dtype=object)

修改列名

1 2	`df1 = df.rename(columns={'Unnamed: 0': 'Image_set'}) df1.columns`

Index(['Image_set', 'Image', 'Manual', 'Manual2', 'Auto_th_2', 'Auto_th_3',
       'Auto_th_4'],
      dtype='object')

显示数据类型

1	`df.dtypes`

Unnamed: 0     object
Image          object
Manual        float64
Manual2       float64
Auto_th_2       int64
Auto_th_3       int64
Auto_th_4       int64
dtype: object

显示统计数据

1	`df.describe()`

	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
count	94.000000	3.000000	100.000000	100.000000	100.000000
mean	100.021277	91.333333	76.370000	97.580000	93.210000
std	11.285140	7.637626	11.971055	12.327337	14.128769
min	80.000000	83.000000	55.000000	71.000000	63.000000
25%	90.250000	88.000000	67.750000	89.500000	83.750000
50%	101.000000	93.000000	74.500000	98.500000	93.000000
75%	108.000000	95.500000	85.000000	106.000000	103.250000
max	120.000000	98.000000	109.000000	124.000000	129.000000

37 - Introduction to Pandas - Data Manipulation

import pandas as pd

df = pd.read_csv('data/manual_vs_auto.csv')
df.head()

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67

删除某列

1 2	`df1 = df.drop('Manual2', axis=1) df1.head()`

	Unnamed: 0	Image	Manual	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	70	87	82
1	Set1	Image2	87.0	60	85	83
2	Set1	Image3	104.0	74	99	94
3	Set1	Image4	99.0	73	101	109
4	Set1	Image5	89.0	59	90	67

删除多列

1 2	`df2 = df.drop(['Manual2', 'Auto_th_2'], axis=1) df2.head()`

	Unnamed: 0	Image	Manual	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	87	82
1	Set1	Image2	87.0	85	83
2	Set1	Image3	104.0	99	94
3	Set1	Image4	99.0	101	109
4	Set1	Image5	89.0	90	67

添加并填充一列

1 2	`df['Date'] = '2019-06-24' df.head()`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4	Date
0	Set1	Image1	92.0	93.0	70	87	82	2019-06-24
1	Set1	Image2	87.0	83.0	60	85	83	2019-06-24
2	Set1	Image3	104.0	98.0	74	99	94	2019-06-24
3	Set1	Image4	99.0	NaN	73	101	109	2019-06-24
4	Set1	Image5	89.0	NaN	59	90	67	2019-06-24

1	`df.dtypes`

Unnamed: 0     object
Image          object
Manual        float64
Manual2       float64
Auto_th_2       int64
Auto_th_3       int64
Auto_th_4       int64
Date           object
dtype: object

将字符串转换成时间

1 2	`df['Date'] = pd.to_datetime('2019-06-24') df.head()`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4	Date
0	Set1	Image1	92.0	93.0	70	87	82	2019-06-24
1	Set1	Image2	87.0	83.0	60	85	83	2019-06-24
2	Set1	Image3	104.0	98.0	74	99	94	2019-06-24
3	Set1	Image4	99.0	NaN	73	101	109	2019-06-24
4	Set1	Image5	89.0	NaN	59	90	67	2019-06-24

1	`df.dtypes`

Unnamed: 0            object
Image                 object
Manual               float64
Manual2              float64
Auto_th_2              int64
Auto_th_3              int64
Auto_th_4              int64
Date          datetime64[ns]
dtype: object

保存成 .csv 文件

1	`df.to_csv('data/manual_vs_auto_updated.csv')`

删除某行

1 2	`df1 = df.drop(df.index[1]) df1.head()`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4	Date
0	Set1	Image1	92.0	93.0	70	87	82	2019-06-24
2	Set1	Image3	104.0	98.0	74	99	94	2019-06-24
3	Set1	Image4	99.0	NaN	73	101	109	2019-06-24
4	Set1	Image5	89.0	NaN	59	90	67	2019-06-24
5	Set1	Image6	115.0	NaN	82	124	105	2019-06-24

删除前 10 行

1 2	`df1 = df.iloc[10:,] df1.head()`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4	Date
10	Set1	Image11	91.0	NaN	61	87	77	2019-06-24
11	Set1	Image12	119.0	NaN	79	105	111	2019-06-24
12	Set1	Image13	NaN	NaN	65	90	84	2019-06-24
13	Set1	Image14	117.0	NaN	94	115	105	2019-06-24
14	Set1	Image15	91.0	NaN	66	99	70	2019-06-24

选取某些行

1 2	`df1 = df[df['Unnamed: 0'] != 'Set1'] df1.head()`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4	Date
25	Set2	Image26	102.0	NaN	85	103	105	2019-06-24
26	Set2	Image27	93.0	NaN	76	84	98	2019-06-24
27	Set2	Image28	83.0	NaN	62	71	87	2019-06-24
28	Set2	Image29	110.0	NaN	92	117	85	2019-06-24
29	Set2	Image30	89.0	NaN	70	96	81	2019-06-24

38 - Introduction to Pandas - Data Sorting

排序

import pandas as pd

df = pd.read_csv('data/manual_vs_auto.csv')
df2 = df.sort_values('Manual', ascending=True)  # ascending: 升序

选取某行/列

1	`df2[['Manual', 'Auto_th_2']]`

	Manual	Auto_th_2
34	80.0	58
96	80.0	66
93	81.0	65
66	81.0	65
44	82.0	67
...	...	...
32	NaN	66
59	NaN	74
79	NaN	69
82	NaN	64
94	NaN	104

100 rows × 2 columns

1	`df[20: 30]`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
20	Set1	Image21	89.0	NaN	65	94	86
21	Set1	Image22	88.0	NaN	66	96	83
22	Set1	Image23	106.0	NaN	71	112	105
23	Set1	Image24	107.0	NaN	92	91	111
24	Set1	Image25	108.0	NaN	93	113	115
25	Set2	Image26	102.0	NaN	85	103	105
26	Set2	Image27	93.0	NaN	76	84	98
27	Set2	Image28	83.0	NaN	62	71	87
28	Set2	Image29	110.0	NaN	92	117	85
29	Set2	Image30	89.0	NaN	70	96	81

loc 方法是通过行、列的名称或者标签来寻找我们需要的值。

Pandas 读取某列、某行数据——loc、iloc 用法总结_子木同学的博客-CSDN 博客_pandas iloc

1	`df.loc[20: 30, ['Manual', 'Auto_th_2']]`

	Manual	Auto_th_2
20	89.0	65
21	88.0	66
22	106.0	71
23	107.0	92
24	108.0	93
25	102.0	85
26	93.0	76
27	83.0	62
28	110.0	92
29	89.0	70
30	115.0	77

1 2	`set2_df = df[df['Unnamed: 0'] == 'Set2'] set2_df.head()`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
25	Set2	Image26	102.0	NaN	85	103	105
26	Set2	Image27	93.0	NaN	76	84	98
27	Set2	Image28	83.0	NaN	62	71	87
28	Set2	Image29	110.0	NaN	92	117	85
29	Set2	Image30	89.0	NaN	70	96	81

选取最大值

1	`max(set2_df['Manual'])`

120.0

根据条件选取某些值

1	`df['Manual'] > 100`

0     False
1     False
2      True
3     False
4     False
      ...  
95     True
96    False
97    False
98     True
99    False
Name: Manual, Length: 100, dtype: bool

1	`df[df['Manual'] > 100].head()`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
2	Set1	Image3	104.0	98.0	74	99	94
5	Set1	Image6	115.0	NaN	82	124	105
6	Set1	Image7	102.0	NaN	68	103	93
7	Set1	Image8	117.0	NaN	77	122	88
8	Set1	Image9	104.0	NaN	88	99	112

复合条件

1	`df[(df['Manual'] > 100) & (df['Auto_th_2'] < 100)].head()`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
2	Set1	Image3	104.0	98.0	74	99	94
5	Set1	Image6	115.0	NaN	82	124	105
6	Set1	Image7	102.0	NaN	68	103	93
7	Set1	Image8	117.0	NaN	77	122	88
8	Set1	Image9	104.0	NaN	88	99	112

遍历某些行/列

1
2
3

for index, row in df.iterrows():
    average_auto = (row['Auto_th_2'] + row['Auto_th_3'] + row['Auto_th_4']) / 3
    print(round(average_auto), row['Manual'])

39 - Introduction to Pandas - Grouping Data

import pandas as pd

df = pd.read_csv('data/manual_vs_auto.csv')
df = df.rename(columns = {'Unnamed: 0': 'Image_set'})
df.head()

	Image_set	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67

1 2	`df = df.drop('Manual2', axis=1) df.head()`

	Image_set	Image	Manual	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	70	87	82
1	Set1	Image2	87.0	60	85	83
2	Set1	Image3	104.0	74	99	94
3	Set1	Image4	99.0	73	101	109
4	Set1	Image5	89.0	59	90	67

以 Image_set 为分组做统计

1
2
3

group_by_file = df.groupby(by=['Image_set'])
set_data_count = group_by_file.count()
set_data_avg = group_by_file.mean()

1	`set_data_count`

	Image	Manual	Auto_th_2	Auto_th_3	Auto_th_4
Image_set
Set1	25	24	25	25	25
Set2	25	24	25	25	25
Set3	25	24	25	25	25
Set4	25	22	25	25	25

1	`set_data_avg`

	Manual	Auto_th_2	Auto_th_3	Auto_th_4
Image_set
Set1	100.666667	72.84	98.04	92.36
Set2	98.666667	75.40	98.00	93.44
Set3	100.000000	78.48	95.52	94.40
Set4	100.818182	78.76	98.76	92.64

统计关联性（手动与自动阈值）

1	`df['Manual'].corr(df['Auto_th_2'])`

0.7381233054217538

40 - Introduction to Pandas - Dealing with missing -null- data

import pandas as pd

df = pd.read_csv('data/manual_vs_auto.csv')
df.head(8)

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67
5	Set1	Image6	115.0	NaN	82	124	105
6	Set1	Image7	102.0	NaN	68	103	93
7	Set1	Image8	117.0	NaN	77	122	88

查看缺失值

1	`df.isnull()`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	False	False	False	False	False	False	False
1	False	False	False	False	False	False	False
2	False	False	False	False	False	False	False
3	False	False	False	True	False	False	False
4	False	False	False	True	False	False	False
...	...	...	...	...	...	...	...
95	False	False	False	True	False	False	False
96	False	False	False	True	False	False	False
97	False	False	False	True	False	False	False
98	False	False	False	True	False	False	False
99	False	False	False	True	False	False	False

100 rows × 7 columns

1	`df.isnull().sum()`

Unnamed: 0     0
Image          0
Manual         6
Manual2       97
Auto_th_2      0
Auto_th_3      0
Auto_th_4      0
dtype: int64

删除缺失值

1
2
3

df = df.drop('Manual2', axis=1)
df2 = df.dropna()
df2.head(10)

	Unnamed: 0	Image	Manual	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	70	87	82
1	Set1	Image2	87.0	60	85	83
2	Set1	Image3	104.0	74	99	94
3	Set1	Image4	99.0	73	101	109
4	Set1	Image5	89.0	59	90	67
5	Set1	Image6	115.0	82	124	105
6	Set1	Image7	102.0	68	103	93
7	Set1	Image8	117.0	77	122	88
8	Set1	Image9	104.0	88	99	112
9	Set1	Image10	103.0	69	98	94

1 2	`df = pd.read_csv('data/manual_vs_auto.csv') df.describe()`

	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
count	94.000000	3.000000	100.000000	100.000000	100.000000
mean	100.021277	91.333333	76.370000	97.580000	93.210000
std	11.285140	7.637626	11.971055	12.327337	14.128769
min	80.000000	83.000000	55.000000	71.000000	63.000000
25%	90.250000	88.000000	67.750000	89.500000	83.750000
50%	101.000000	93.000000	74.500000	98.500000	93.000000
75%	108.000000	95.500000	85.000000	106.000000	103.250000
max	120.000000	98.000000	109.000000	124.000000	129.000000

填充缺失值

1 2	`df['Manual'].fillna(100, inplace=True) df.head(10)`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67
5	Set1	Image6	115.0	NaN	82	124	105
6	Set1	Image7	102.0	NaN	68	103	93
7	Set1	Image8	117.0	NaN	77	122	88
8	Set1	Image9	104.0	NaN	88	99	112
9	Set1	Image10	103.0	NaN	69	98	94

使用平均值填充缺失值

import numpy as np

df = pd.read_csv('data/manual_vs_auto.csv')
df['Manual'] = df.apply(
    lambda row: (round((row['Auto_th_2'] + row['Auto_th_3'] + row['Auto_th_3']) / 3))  # 平均值
    if np.isnan(row['Manual'])  # 如果是缺失值的话
    else row['Manual'], axis=1)  # 填充在 Manual 列上

1	`df.head(10)`

	Unnamed: 0	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67
5	Set1	Image6	115.0	NaN	82	124	105
6	Set1	Image7	102.0	NaN	68	103	93
7	Set1	Image8	117.0	NaN	77	122	88
8	Set1	Image9	104.0	NaN	88	99	112
9	Set1	Image10	103.0	NaN	69	98	94

41 - Introduction to Pandas - Plotting

import pandas as pd

df = pd.read_csv('data/manual_vs_auto.csv')
df = df.rename(columns={'Unnamed: 0': 'Image_set'})
df.head()

	Image_set	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4
0	Set1	Image1	92.0	93.0	70	87	82
1	Set1	Image2	87.0	83.0	60	85	83
2	Set1	Image3	104.0	98.0	74	99	94
3	Set1	Image4	99.0	NaN	73	101	109
4	Set1	Image5	89.0	NaN	59	90	67

绘制折线图

1	`df['Manual'].plot()`

<AxesSubplot:>

png

绘制直方图

1 2	`# 类型 hist，分组 30，标题 Manual Count，图像大小 12 * 10 df['Manual'].plot(kind='hist', bins=30, title='Manual Count', figsize=(12, 10))`

<AxesSubplot:title={'center':'Manual Count'}, ylabel='Frequency'>

png

详解 pandas 中的 rolling

1	`df['Manual'].rolling(3).mean().plot()`

<AxesSubplot:>

png

绘制 box 图，判断平均数与中位数的关系 https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.boxplot.html

1	`df['Manual'].describe()`

count     94.000000
mean     100.021277
std       11.285140
min       80.000000
25%       90.250000
50%      101.000000
75%      108.000000
max      120.000000
Name: Manual, dtype: float64

1	`df['Manual'].plot(kind='box', figsize=(8, 6))`

<AxesSubplot:>

png

散点图

1	`df.plot(kind='scatter', x='Manual', y='Auto_th_2', title='Manual vs Auto 2')`

<AxesSubplot:title={'center':'Manual vs Auto 2'}, xlabel='Manual', ylabel='Auto_th_2'>

png

pd.apply(): Pandas 数据处理(五) — apply() 方法介绍！

def cell_count(x):
    if x <= 100.0:
        return 'low'
    else:
        return 'high'

1 2	`df['cell_count_index'] = df['Manual'].apply(cell_count) df.head()`

	Image_set	Image	Manual	Manual2	Auto_th_2	Auto_th_3	Auto_th_4	cell_count_index
0	Set1	Image1	92.0	93.0	70	87	82	low
1	Set1	Image2	87.0	83.0	60	85	83	low
2	Set1	Image3	104.0	98.0	74	99	94	high
3	Set1	Image4	99.0	NaN	73	101	109	low
4	Set1	Image5	89.0	NaN	59	90	67	low

1	`df.to_csv('data/manual_vs_auto2.csv')`

1	`df.boxplot(column='Manual', by='cell_count_index')`

<AxesSubplot:title={'center':'Manual'}, xlabel='cell_count_index'>

png

42 - Introduction to Seaborn Plotting in Python

Seaborn

import pandas as pd

df = pd.read_csv('data/manual_vs_auto.csv')
df['Manual'].fillna(100, inplace=True)
df = df.rename(columns={'Unnamed: 0': 'Image_Set'})

1
2
3

import seaborn as sns

sns.distplot(df['Manual'])

C:\Users\gzjzx\anaconda3\lib\site-packages\seaborn\distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)



<AxesSubplot:xlabel='Manual', ylabel='Density'>

png

sns.kdeplot()核密度估计图

sns.kdeplot(df['Manual'], shade=True)
sns.kdeplot(df['Auto_th_2'], shade=True)
sns.kdeplot(df['Auto_th_3'], shade=True)
sns.kdeplot(df['Auto_th_4'], shade=True)

<AxesSubplot:xlabel='Manual', ylabel='Density'>

png

sns.jointplot() 双变量关系图

1	`sns.jointplot(x='Manual', y='Auto_th_2', data=df, kind='kde')`

<seaborn.axisgrid.JointGrid at 0x212f9ad23d0>

png

sns.pairplot() 用来展示两两特征之间的关系

1	`sns.pairplot(df, x_vars=['Auto_th_2', 'Auto_th_3', 'Auto_th_4'], y_vars='Manual', height=6)`

<seaborn.axisgrid.PairGrid at 0x212f9bd0fd0>

png

sns.lmplot() 展示线性关系

1	`sns.lmplot(x='Manual', y='Auto_th_2', data=df, order=1, hue='Image_Set')`

<seaborn.axisgrid.FacetGrid at 0x212fa457f70>

png

from scipy import stats

slope, intercept, r_value, p_value, std_err = stats.linregress(df['Manual'], df['Auto_th_2'])
slope, intercept, r_value, p_value, std_err

(0.772483189743971,
 -0.8937686381919718,
 0.7058094587729904,
 2.396963973676236e-16,
 0.07831918096230937)

sns.swarmplot() 分簇散点图

df = pd.read_csv('data/manual_vs_auto2.csv')
df['Manual'].fillna(100, inplace=True)
df = df.rename(columns={'Unnamed: 0': 'Image_Set'})

sns.swarmplot(x='Image_Set', y='Manual', data=df, hue='cell_count_index', dodge=True)

<AxesSubplot:xlabel='Image_Set', ylabel='Manual'>

png

sns.heatmap() 热图

1 2	`corr = df.loc[:,df.dtypes == 'int64'].corr() #Correlates all int64 columns sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap=sns.diverging_palette(220, 10, as_cmap=True))`

<AxesSubplot:>

png